88 research outputs found

    Singularity avoidance for collapsing quantum dust in the Lemaitre-Tolman-Bondi model

    We investigate the fate of the classical singularity in a collapsing dust cloud. For this purpose, we quantize the marginally bound Lemaitre-Tolman-Bondi model for spherically symmetric dust collapse by considering each dust shell in the cloud individually, taking the outermost shell as a representative. Because the dust naturally provides a preferred notion of time, we can construct a quantum mechanical model for this shell and demand unitary evolution for wave packets. It turns out that the classical singularity can generically be avoided provided the quantization ambiguities fulfill some weak conditions. We demonstrate that the collapse to a singularity is replaced by a bounce followed by an expansion. We finally construct a quantum corrected spacetime describing bouncing dust collapse and calculate the time from collapse to expansion. Comment: 20 pages, 2 figures

    MulTe: A Multi-Tenancy Database Benchmark Framework

    Multi-tenancy in relational databases has been a topic of interest for a couple of years. On the one hand, ever increasing capabilities and capacities of modern hardware easily allow for multiple database applications to share one system. On the other hand, cloud computing leads to outsourcing of many applications to service architectures, which in turn leads to offerings for relational databases in the cloud as well. The ability to benchmark multi-tenancy database systems (MT-DBMSs) is imperative to evaluate and compare systems and helps to reveal otherwise unnoticed shortcomings. With several tenants sharing an MT-DBMS, a benchmark is considerably different from classic database benchmarks and calls for new benchmarking methods and performance metrics. Unfortunately, there is no single, well-accepted multi-tenancy benchmark for MT-DBMSs available, and few efforts have been made regarding the methodology and general tooling of the process. We propose a method to benchmark MT-DBMSs and provide a framework for building such benchmarks. To support the cumbersome process of defining and generating tenants, loading and querying their data, and analyzing the results, we propose and provide MULTE, an open-source framework that helps with all these steps.

    Cardinality estimation in ETL processes

    Cardinality estimation in ETL processes is particularly difficult. Aside from the well-known SQL operators, which are also used in ETL processes, there are a variety of operators without exact counterparts in the relational world. In addition to those, we find operators that support very specific data integration aspects. For such operators, there are no well-examined statistical approaches for cardinality estimation. Therefore, we propose a black-box approach and estimate the cardinality using a set of statistical models for each operator. We discuss different model granularities and develop an adaptive cardinality estimation framework for ETL processes. We map the abstract model operators to specific statistical learning approaches (regression, decision trees, support vector machines, etc.) and evaluate our cardinality estimations in an extensive experimental study.

    Penalized Graph Partitioning based Allocation Strategy for Database-as-a-Service Systems

    Databases as a service (DBaaS) transfer the advantages of cloud computing to data management systems, which is important for the big data era. The allocation in a DBaaS system, i.e., the mapping from databases to nodes of the infrastructure, influences performance, utilization, and cost-effectiveness of the system. Modeling databases and the underlying infrastructure as weighted graphs and using graph partitioning and mapping algorithms yields an allocation strategy. However, graph partitioning assumes that individual vertex weights add up (linearly) to partition weights. In reality, performance usually does not scale linearly with the amount of work due to contention on the hardware, on operating system resources, or on DBMS components. To overcome this issue, we propose an allocation strategy based on penalized graph partitioning in this paper. We show how existing algorithms can be modified for graphs with non-linear partition weights, i.e., vertex weights that do not sum up linearly to partition weights. We experimentally evaluate our allocation strategy in a DBaaS system with 1,000 databases on 32 nodes.
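
The difference between linear and non-linear partition weights can be illustrated with a small sketch. Everything named here is an assumption for illustration: the abstract does not give the penalty function, so `quadratic_penalty` and its `factor` are hypothetical stand-ins for a contention cost that grows with the number of databases placed on one node.

```python
def linear_partition_weight(vertex_weights):
    """Classic graph partitioning: a partition weighs the sum of its vertices."""
    return sum(vertex_weights)


def quadratic_penalty(num_vertices, factor=0.1):
    """Hypothetical contention penalty that grows with the number of co-located vertices."""
    return factor * num_vertices * num_vertices


def penalized_partition_weight(vertex_weights, penalty):
    """Non-linear partition weight: the linear sum plus a penalty on partition cardinality."""
    return sum(vertex_weights) + penalty(len(vertex_weights))


# One heavy database versus eight light ones with the same total load.
heavy = [8.0]
light = [1.0] * 8

# Under the linear model both partitions look equally loaded.
same_linear = linear_partition_weight(heavy) == linear_partition_weight(light)

# Under the penalized model the partition with many small databases is heavier.
light_is_heavier = (
    penalized_partition_weight(light, quadratic_penalty)
    > penalized_partition_weight(heavy, quadratic_penalty)
)
```

A partitioner that balances the penalized weights would therefore avoid packing many small databases onto one node, even when their summed load matches that of a single large database.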

    Allocation Strategies for Data-Oriented Architectures

    Data orientation is a common design principle in distributed data management systems. In contrast to process-oriented or transaction-oriented system designs, data-oriented architectures are based on data locality and function shipping. The tight coupling of data and processing thereon is implemented in different systems in a variety of application scenarios such as data analysis, database-as-a-service, and data management on multiprocessor systems. Data-oriented systems, i.e., systems that implement a data-oriented architecture, bundle data and operations together in tasks which are processed locally on the nodes of the distributed system. Allocation strategies, i.e., methods that decide the mapping from tasks to nodes, are core components in data-oriented systems. Good allocation strategies can lead to balanced systems while bad allocation strategies cause skew in the load and therefore suboptimal application performance and infrastructure utilization. Optimal allocation strategies are hard to find given the complexity of the systems, the complicated interactions of tasks, and the huge solution space. To ensure the scalability of data-oriented systems and to keep them manageable with hundreds of thousands of tasks, thousands of nodes, and dynamic workloads, fast and reliable allocation strategies are mandatory. In this thesis, we develop novel allocation strategies for data-oriented systems based on graph partitioning algorithms. To this end, we show that systems from different application scenarios with different abstraction levels can be generalized to generic infrastructure and workload descriptions. We use weighted graph representations to model infrastructures with bounded and unbounded, i.e., overcommitted, resources and possibly non-linear performance characteristics. Based on our generalized infrastructure and workload model, we formalize the allocation problem, which seeks valid and balanced allocations that minimize communication.
Our allocation strategies partition the workload graph using solution heuristics that work with single and multiple vertex weights. Novel extensions to these solution heuristics can be used to balance penalized and secondary graph partition weights. These extensions enable the allocation strategies to handle infrastructures with non-linear performance behavior. On top of the basic algorithms, we propose methods to incorporate heterogeneous infrastructures and to react to changing workloads and infrastructures by incrementally updating the partitioning. We evaluate all components of our allocation strategy algorithms and show their applicability and scalability with synthetic workload graphs. In end-to-end performance experiments in two actual data-oriented systems, a database-as-a-service system and a database management system for multiprocessor systems, we prove that our allocation strategies outperform alternative state-of-the-art methods.

    Pairwise Element Computation with MapReduce

    In this paper, we present a parallel method to evaluate functions on pairs of elements. It is a challenge to partition the Cartesian product of a set with itself in order to parallelize the function evaluation on all pairs. Our solution uses (a) replication of set elements to allow for partitioning and (b) aggregation of the results gathered for different copies of an element. Based on an execution model with nodes that execute tasks on local data without online communication, we present a generic algorithm and show how it can be implemented with MapReduce. Three different distribution schemes that define the partitioning of the Cartesian product are introduced, compared, and evaluated. Any one of the distribution schemes can be used to derive and implement a specific algorithm for parallel pairwise element computation
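
The replicate-then-aggregate idea can be sketched in plain Python that mimics a map step and a reduce step. This is a minimal illustration under stated assumptions, not one of the three distribution schemes from the paper: the block-pair scheme, the function names (`map_phase`, `reduce_phase`, `pairwise`), and the round-robin block assignment are all hypothetical, and the sketch assumes at least two input elements.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(elements, num_blocks):
    """Replicate each element to every task (pair of blocks) that involves its block."""
    tasks = defaultdict(list)
    for idx, elem in enumerate(elements):
        block = idx % num_blocks  # assumed round-robin block assignment
        for other in range(num_blocks):
            key = (min(block, other), max(block, other))
            tasks[key].append((idx, elem))
    return tasks


def reduce_phase(tasks, num_blocks, func):
    """Evaluate func on the local pairs of each task, without cross-task communication."""
    partial = defaultdict(list)
    for (a, b), members in tasks.items():
        for (i, x), (j, y) in combinations(members, 2):
            # A mixed task (a != b) skips same-block pairs; task (a, a) or (b, b) covers them,
            # so every pair is evaluated exactly once overall.
            if a != b and i % num_blocks == j % num_blocks:
                continue
            result = func(x, y)
            partial[i].append(result)
            partial[j].append(result)
    return partial


def pairwise(elements, num_blocks, func, aggregate):
    """Aggregate, per element, the results gathered for all of its copies."""
    partial = reduce_phase(map_phase(elements, num_blocks), num_blocks, func)
    return {i: aggregate(partial[i]) for i in range(len(elements))}


# Usage: distance from each number to its nearest neighbour in the set.
nearest = pairwise([1, 4, 9, 11], 2, lambda x, y: abs(x - y), min)
```

With `num_blocks` blocks, each element is copied to `num_blocks` tasks, which is the replication cost paid for being able to partition the Cartesian product.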

    The Time has Come – Application of Artificial Intelligence in Small- and Medium-Sized Enterprises

    Artificial intelligence (AI) is not yet widely used in small- and medium-sized industrial enterprises (SMEs). The reasons for this are manifold and range from poorly understood use cases and too few trained employees to too little data. This article presents a successful design-oriented case study at a medium-sized company where the described reasons are present. In this study, future demand forecasts are generated based on historical demand data for products at a material number level using a gradient boosting machine (GBM). An improvement of 15% on the status quo (measured by the root mean squared error) could be achieved with rather simple techniques. Hence, the motivation, the method, and the first results are presented. Finally, challenges are addressed from which practical users should derive learning experiences and impulses for their own projects.

    Scalable frequent itemset mining on many-core processors

    Frequent-itemset mining is an essential part of the association rule mining process, which has many application areas. It is a computation- and memory-intensive task with many opportunities for optimization. Many efficient sequential and parallel algorithms have been proposed in recent years. Most of the parallel algorithms, however, cannot cope with the huge number of threads provided by large multiprocessor or many-core systems. In this paper, we provide mcEclat, a highly parallel version of the well-known Eclat algorithm. It runs on both multiprocessor systems and many-core coprocessors and scales well up to a very large number of threads (244 in our experiments). To evaluate mcEclat's performance, we conducted many experiments on realistic datasets. mcEclat achieves high speedups of up to 11.5x and 100x on a 12-core multiprocessor system and a 61-core Xeon Phi many-core coprocessor, respectively. Furthermore, mcEclat is competitive with highly optimized existing frequent-itemset mining implementations taken from the FIMI repository.
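
For readers unfamiliar with the baseline, the following is a minimal sketch of the sequential, textbook Eclat algorithm, which mines frequent itemsets depth-first by intersecting transaction-id sets. It is not the parallel mcEclat described in the abstract; the function name `eclat` and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} using vertical tid-set intersection."""
    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tids) pairs that are frequent together with prefix.
        for pos, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in candidates[pos + 1:]:
                shared = tids & other_tids  # support of itemset plus other
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Toy usage with five transactions and an absolute support threshold of 3.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
itemsets = eclat(baskets, min_support=3)
```

Each recursive call works only on the tid-sets handed to it, which is one reason the depth-first search tree is a natural unit for distributing work across threads.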

    A Query, a Minute: Evaluating Performance Isolation in Cloud Databases

    Several cloud providers offer relational databases as part of their portfolio. It is, however, not obvious how resource virtualization and sharing, which are inherent to cloud computing, influence the performance and predictability of these cloud databases. Cloud providers give little to no guarantees for consistent execution or isolation from other users. To evaluate the performance isolation capabilities of two commercial cloud databases, we ran a series of experiments over the course of a week (a query, a minute) and report variations in query response times. As a baseline, we ran the same experiments on a dedicated server in our data center. The results show that in the cloud single outliers are up to 31 times slower than the average. Additionally, one can see a point in time after which the average performance of all executed queries improves by 38%.

    pcApriori: Scalable apriori for multiprocessor systems

    Frequent-itemset mining is an important part of data mining. It is a computation- and memory-intensive task and has a large number of scientific and statistical application areas. In many of them, the datasets can easily grow to tens or even several hundred gigabytes of data. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed, which, however, cannot exploit current and future systems providing large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads increases substantially. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linearly on our test system comprising 32 cores.